Learning methods and features for corpus-based phrase break prediction on Thai
Abstract
This paper applies five well-known learning methods to Thai phrase break prediction. Phrase break prediction is particularly important for our Thai text-to-speech (TTS) synthesizer, because input Thai text has no explicit word or sentence boundaries. The learning methods are a POS sequence model, CART, RIPPER, SLIPPER, and a neural network. The features proposed for the learners can be extracted directly from the input text during actual processing. The best method, based on the CART model, gives 80.14% correct-break, 94.40% juncture-correct, and 2.37% false-break scores. Compared with our previous models based on C4.5 and RIPPER, the new optimized method achieves almost the best performance.
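The abstract reports three scores without defining them. The sketch below shows one way such scores could be computed. It assumes the definitions commonly used in the phrase break literature with binary break/no-break labels per juncture: correct-break as the share of reference breaks that were predicted, juncture-correct as the share of all junctures labelled correctly, and false-break as the share of all junctures where a break was wrongly inserted. The paper itself may define them differently, and the function name `break_scores` is hypothetical.

```python
def break_scores(reference, predicted):
    """Return (correct_break, juncture_correct, false_break) in percent.

    reference, predicted: equal-length sequences of booleans, one entry per
    juncture (True = phrase break). The definitions are assumed, not taken
    from the paper.
    """
    if len(reference) != len(predicted):
        raise ValueError("sequences must have the same length")
    n_junctures = len(reference)
    n_breaks = sum(bool(r) for r in reference)
    if n_junctures == 0 or n_breaks == 0:
        raise ValueError("need at least one juncture and one reference break")

    pairs = [(bool(r), bool(p)) for r, p in zip(reference, predicted)]
    hits = sum(r and p for r, p in pairs)              # breaks found
    insertions = sum((not r) and p for r, p in pairs)  # breaks wrongly added
    agree = sum(r == p for r, p in pairs)              # junctures labelled correctly

    return (
        100.0 * hits / n_breaks,
        100.0 * agree / n_junctures,
        100.0 * insertions / n_junctures,
    )


# Toy example: five junctures, two reference breaks.
reference = [True, False, False, True, False]
predicted = [True, True, False, False, False]
scores = break_scores(reference, predicted)
```

Under these assumed definitions, a high juncture-correct score can coexist with a much lower correct-break score, because most junctures carry no break.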
Similar resources
Learning phrase break detection in Thai text-to-speech
One of the crucial problems in developing high-quality Thai text-to-speech synthesis is detecting phrase breaks in Thai text. Unlike English, Thai has no word boundary delimiter and no punctuation mark at the end of a sentence, which makes the problem more serious: an incorrectly detected phrase break not only produces unnatural speech but can also change the meaning. In t...
Learning continuous-valued word representations for phrase break prediction
Phrase break prediction is the first step in modeling prosody for text-to-speech (TTS) systems. Traditional methods of phrase break prediction have used discrete linguistic representations (such as POS tags, induced POS tags, and word-terminal syllables) to model these breaks. However, these discrete representations suffer from a number of issues such as fixing the number of discrete classes and al...
Learning Rules for Chinese Prosodic Phrase Prediction
This paper describes a rule-learning approach towards Chinese prosodic phrase prediction for TTS systems. Firstly, we prepared a speech corpus having about 3000 sentences and manually labelled the sentences with two-level prosodic structure. Secondly, candidate features related to prosodic phrasing and the corresponding prosodic boundary labels are extracted from the corpus text to establish an...
Corpus-Based Evaluation of Prosodic Phrase Break Prediction Using nltk_lite’s Chunk Parser to Detect Prosodic Phrase Boundaries in the Aix-MARSEC Corpus of Spoken English
An automatic phrase break prediction system aims to identify prosodic-syntactic boundaries in text which correspond to the way a native speaker might process or chunk that same text as speech. In computational linguistics, Machine Learning from hand-annotated corpus data has become the de-facto standard approach to text annotation problems such as prosodic annotation. This is treated as a class...
Incorporating second-order information into two-step major phrase break prediction for Korean
In this paper, we present a new phrase break prediction method that integrates second-order information into a general maximum entropy model. In our research, the phrase break prediction problem is mapped to a classification problem. The features we use to predict phrase breaks span several layers, such as local features (part-of-speech (POS) tags, a lexicon, lengths of eojeols and...